# 42 Orthogonal sets and orthogonal matrices.

Let us describe a very nice kind of basis (an orthonormal basis) and a very special kind of matrix called an orthogonal matrix.

## Orthogonal sets and orthogonal basis sets.

We say a set of vectors $S = \{v_{1},v_{2},\ldots,v_{\ell}\}$ from $\mathbb R^{n}$ is an **orthogonal set** if the vectors are pairwise orthogonal to each other, that is
$$
v_{i} \cdot v_{j} = 0 \quad\text{whenever \(i\neq j\)}.
$$
We further say a set of vectors $S = \{v_{1},v_{2},\ldots,v_{\ell}\}$ is an **orthonormal set** if it is an orthogonal set _and_ each vector has length $1$, that is
$$
v_{i} \cdot v_{j} = \delta_{ij} = \begin{cases}0 & \text{if \(i\neq j\)} \\ 1 & \text{if \(i=j\).} \end{cases}
$$
Here $\delta_{ij}$ denotes the _Kronecker delta_, where $\delta_{ij}= 0$ when $i\neq j$, and $\delta_{ij}=1$ when $i=j$.

**Example.** The set of vectors
$$
S = \left\{\begin{pmatrix}1\\2\\1\\2\end{pmatrix}, \begin{pmatrix}2\\-1\\2\\-1\end{pmatrix},\begin{pmatrix}1\\0\\-1\\0\end{pmatrix}\right\}
$$
is an orthogonal set. Indeed, checking every pair of distinct vectors, we see that
$$
\begin{align*}
\begin{pmatrix}1\\2\\1\\2\end{pmatrix} \cdot \begin{pmatrix}2\\-1\\2\\-1\end{pmatrix} = 0, \ \ \begin{pmatrix}1\\2\\1\\2\end{pmatrix} \cdot \begin{pmatrix}1\\0\\-1\\0\end{pmatrix} = 0 ,\ \ \begin{pmatrix}2\\-1\\2\\-1\end{pmatrix} \cdot \begin{pmatrix}1\\0\\-1\\0\end{pmatrix} = 0.
\end{align*}
$$
However, $S$ here is not an orthonormal set, since its vectors do not all have length $1$. We can, however, produce an orthonormal set out of $S$ by normalizing each vector. So
$$
\left\{ \frac{1}{\sqrt{10}} \begin{pmatrix}1\\2\\1\\2\end{pmatrix}, \frac{1}{\sqrt{10}}\begin{pmatrix}2\\-1\\2\\-1\end{pmatrix}, \frac{1}{\sqrt{2}} \begin{pmatrix}1\\0\\-1\\0\end{pmatrix} \right\}
$$
is an orthonormal set, obtained by normalizing each vector of $S$. $\blacksquare$

Why do we care about orthogonal sets? We get linear independence "for free", provided that there is no zero vector in the set:

> **Proposition.**
> Let $S = \{v_{1},v_{2},\ldots,v_{\ell}\}$ be an orthogonal set such that each vector $v_{i} \neq \vec0$. Then $S$ is _linearly independent_.
>
> In particular, $S$ forms a basis for the linear space $W = \operatorname{span}(S)$.

Also, since an orthonormal set has no zero vectors, an orthonormal set is automatically linearly independent.

$\blacktriangleright$ Suppose $S = \{v_{1},v_{2},\ldots,v_{\ell}\}$ is an orthogonal set where each vector is not the zero vector, $v_{i}\neq \vec 0$. Now suppose we have a linear combination
$$
c_{1}v_{1}+c_{2}v_{2}+\cdots + c_{\ell}v_{\ell} = \vec 0.
$$
Taking the dot product of the above expression with $v_{i}$, all the cross terms $v_{i}\cdot v_{j}$ with $j\neq i$ vanish, and we see that
$$
\begin{array}{l}
v_{i} \cdot (c_{1} v_{1}+c_{2}v_{2}+\cdots + c_{\ell}v_{\ell}) = v_{i}\cdot \vec 0 \\
\implies c_{i} \underbrace{v_{i} \cdot v_{i}}_{\neq 0} = 0 \\
\implies c_{i} = 0,
\end{array}
$$
where $v_{i}\cdot v_{i}\neq 0$ as $v_{i}$ is not the zero vector. This shows each coefficient $c_{i}= 0$, hence the set $S =\{v_{1},v_{2},\ldots,v_{\ell}\}$ is linearly independent. $\blacksquare$

As they are automatically linearly independent, orthogonal sets without zero vectors are suitable as basis sets for a linear space.
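If you want a quick numerical sanity check of the example above (and of the "orthogonal implies linearly independent" fact), here is a small NumPy sketch. It is not part of the original notes, and the variable names are my own:

```python
import numpy as np

# The three vectors of the example set S.
v1 = np.array([1, 2, 1, 2], dtype=float)
v2 = np.array([2, -1, 2, -1], dtype=float)
v3 = np.array([1, 0, -1, 0], dtype=float)
S = [v1, v2, v3]

# Pairwise dot products all vanish, so S is an orthogonal set.
for i in range(3):
    for j in range(i + 1, 3):
        print(f"v{i+1} . v{j+1} =", S[i] @ S[j])        # all 0.0

# None of the vectors is zero, so S should be linearly independent:
# the 4x3 matrix with these columns has rank 3.
print(np.linalg.matrix_rank(np.column_stack(S)))        # 3

# Normalizing each vector produces an orthonormal set.
U = [v / np.linalg.norm(v) for v in S]
print([float(np.linalg.norm(u)) for u in U])            # all 1.0
```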
Why do we care about having an orthogonal set or orthonormal set as a basis for a linear space? One reason is that it is easy to find the coefficients of a vector as a linear combination over an orthogonal basis.

> **Coefficients of a linear combination over an orthogonal basis set.**
> Let $W$ be a linear space with an orthogonal basis $\beta = \{v_{1},v_{2},\ldots,v_{\ell}\}$. Then any $x\in W$ can be written as $x = c_{1}v_{1}+c_{2}v_{2}+\cdots +c_{\ell}v_{\ell}$, where
> $$
> c_{i} = \frac{v_{i}\cdot x}{v_{i}\cdot v_{i}}.
> $$

$\blacktriangleright$ Since $\beta = \{v_{1},v_{2},\ldots,v_{\ell}\}$ is a basis for $W$, every vector $x\in W$ can be written as a linear combination over $v_{1},v_{2},\ldots,v_{\ell}$,
$$
x = c_{1} v_{1} + c_{2} v_{2} + \cdots + c_{\ell} v_{\ell}.
$$
To compute the coefficient $c_{i}$ we take the dot product of $x$ with $v_{i}$; all the cross terms vanish by orthogonality (and $v_{i}\cdot v_{i}\neq 0$ since basis vectors are never the zero vector), which gives
$$
\begin{array}{l}
v_{i} \cdot x = v_{i} \cdot (c_{1}v_{1}+c_{2}v_{2}+\cdots + c_{\ell} v_{\ell}) \\
v_{i} \cdot x = c_{i}(v_{i}\cdot v_{i}) \\
c_{i} = \displaystyle \frac{v_{i} \cdot x}{v_{i}\cdot v_{i}}. \quad\blacksquare
\end{array}
$$

**Example.** Let $\beta = \{v_{1},v_{2},v_{3}\}$ be an orthogonal basis of $\mathbb R^{3}$ where
$$
v_{1} = \begin{pmatrix}1\\1\\1\end{pmatrix}, \quad v_{2}=\begin{pmatrix}1\\0\\-1\end{pmatrix}, \quad v_{3}=\begin{pmatrix}1\\-2\\1\end{pmatrix}
$$
(check it). Express the vector $x = \begin{pmatrix}1\\2\\3\end{pmatrix}$ as a linear combination of the basis vectors in $\beta$.

$\blacktriangleright$ Since $\beta$ is an orthogonal basis, we can compute the coefficients of $x$ with respect to $\beta$ by taking dot products:
$$
\begin{array}{l}
\displaystyle c_{1} = \frac{v_{1} \cdot x}{v_{1}\cdot v_{1}}= \frac{6}{3} = 2 \\
\displaystyle c_{2} = \frac{v_{2}\cdot x}{v_{2}\cdot v_{2}} = \frac{-2}{2}=-1 \\
\displaystyle c_{3} = \frac{v_{3}\cdot x}{v_{3}\cdot v_{3}} = \frac{0}{6}= 0
\end{array}
$$
So $x = 2 v_{1} - v_{2}+0v_{3}$. $\blacksquare$

**Remark.** As it turns out, every subspace $W$ of $\mathbb R^{n}$ has an orthonormal basis, and we can construct such an orthonormal basis by applying the _Gram-Schmidt process_ to an existing basis of $W$.

**A remark for the future.** Perhaps more curiously, it is the fact that a subspace $W$ of $\mathbb R^{n}$ is finite dimensional that guarantees $W$ has an orthonormal basis. As it turns out, there are infinite dimensional inner product spaces that do not have an orthonormal basis in the linear algebra sense, but they may have something called a _complete orthonormal system_, which can approximate any vector in the space via taking limits. This is how Fourier series for functions on closed intervals work!
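The example above is also easy to verify numerically. Here is a small NumPy sketch (again my own addition, not part of the notes) that computes the coefficients with the dot product formula and reconstructs $x$:

```python
import numpy as np

# The orthogonal basis of R^3 from the example, and the vector x to expand.
v = [np.array([1, 1, 1], dtype=float),
     np.array([1, 0, -1], dtype=float),
     np.array([1, -2, 1], dtype=float)]
x = np.array([1, 2, 3], dtype=float)

# c_i = (v_i . x) / (v_i . v_i); no linear system has to be solved.
c = [(vi @ x) / (vi @ vi) for vi in v]
print(c)                                      # [2.0, -1.0, 0.0]

# Reconstruct x from the coefficients as a double check.
print(sum(ci * vi for ci, vi in zip(c, v)))   # [1. 2. 3.]
```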
## Matrices with orthonormal columns, orthogonal matrices.

Suppose $Q$ is an $n\times k$ matrix whose $k$ columns are orthonormal vectors from $\mathbb R^{n}$, namely
$$
Q = \begin{bmatrix} | & | & & |\\u_{1} & u_{2} & \cdots & u_{k}\\| & | & & |\end{bmatrix}
$$
where $u_{i}\cdot u_{j} = \delta_{ij}$. Note $Q$ does not have to be square, but we must have $k \le n$: the orthonormal columns are linearly independent, so their number cannot exceed $n$.

An interesting property of $Q$ is that
$$
Q^{T} Q = I_{k\times k},
$$
which can be seen by working out the entries of $Q^{T}Q$,
$$
Q^{T}Q = \begin{bmatrix} \frac{\quad}{} & u_{1}^{T} & \frac{\quad}{} \\ \frac{\quad}{} & u_{2}^{T} & \frac{\quad}{} \\ & \vdots \\ \frac{\quad}{} & u_{k}^{T} & \frac{\quad}{} \\ \end{bmatrix}\begin{bmatrix} | & | & & |\\u_{1} & u_{2} & \cdots & u_{k}\\| & | & & |\end{bmatrix} = \begin{bmatrix}u_{1}^{T}u_{1} & u_{1}^{T}u_{2} & \cdots & u_{1}^{T}u_{k} \\ u_{2}^{T}u_{1} & u_{2}^{T}u_{2} & \cdots & u_{2}^{T}u_{k} \\ \vdots & \vdots & & \vdots\\ u_{k}^{T}u_{1} & u_{k}^{T}u_{2} & \cdots & u_{k}^{T}u_{k} \end{bmatrix}.
$$
Since $u_{i}^{T}u_{j} = u_{i} \cdot u_{j} = \delta_{ij}$, as the columns $u_{1},u_{2},\ldots,u_{k}$ form an orthonormal set, we see that $Q^{T}Q = I_{k\times k}$. In fact, we have

> **Characterization of having orthonormal columns.**
> Let $Q$ be any $n\times k$ matrix (not necessarily square). Then
> $$
> Q^{T}Q = I_{k\times k} \iff Q\text{ has orthonormal columns.}
> $$

This is again because the $(i,j)$-th entry of $Q^{T}Q$ is
$$
[Q^{T}Q]_{ij}=u_{i}^{T}u_{j},
$$
where $u_{i}$ is the $i$-th column of $Q$. So $Q^{T}Q=I_{k\times k}$ holds precisely when $u_{i}^{T}u_{j} = \delta_{ij}$, that is, when the columns are orthonormal.

**Remark.** This is a useful way to check, or to prove, whether a matrix $A$ has orthonormal columns: compute $A^{T}A$ and see if it is an identity matrix.

**Remark.** A question now is: what about the $n\times n$ matrix $QQ^{T}$, then? We leave this as a mystery for now. Hint: It is an ??th?????? ??j????? !

When $Q$ is a _square matrix with orthonormal columns_, we call $Q$ an **orthogonal matrix**. Yes, the terminology is weird, I know, no need to tell me. It's been like that forever so we are stuck with it. You can blame Frobenius for defining it so.

We record some properties of orthogonal matrices $Q$.

> **Algebraic properties of an orthogonal matrix.**
> If $Q$ is an $n\times n$ orthogonal matrix, then
> (1) $Q$ is invertible.
> (2) $\det(Q)= \pm1$.
> (3) $Q^{-1}=Q^{T}$.
> (4) $Q^{T}Q = QQ^{T} = I_{n\times n}$.
> (5) $Q$ has orthonormal rows.

$\blacktriangleright$ Since $Q$ is a matrix with orthonormal columns, we know $Q^{T}Q = I_{n\times n}$. As $Q$ is square, this shows (3) $Q^{-1}=Q^{T}$, which in turn gives (1) $Q$ is invertible and (4) $QQ^{T} = I_{n\times n}$. Also, since $1=\det(I)=\det(Q^{T}Q)=\det(Q^{T})\det(Q)=\det(Q)^{2}$, we get (2) $\det(Q)=\pm1$. Finally, since $QQ^{T}=I_{n\times n}$, the $(i,j)$-th entry of $QQ^{T}$, namely the $i$-th row of $Q$ times the $j$-th column of $Q^{T}$ (which is the $j$-th row of $Q$), equals $\delta_{ij}$. In other words,
$$
(i\text{-th row of } Q)\cdot (j\text{-th row of } Q)=\delta_{ij},
$$
so (5) the rows of $Q$ are also orthonormal! $\blacksquare$

**Remark.** A remarkable observation is that an orthogonal matrix (a square matrix whose columns form an orthonormal set) is always invertible, and _its inverse is just its own transpose_!

**Examples of $2\times 2$ orthogonal matrices.**

A rotation matrix (rotate counterclockwise by angle $\theta$)
$$
R = \begin{pmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{pmatrix}
$$
is an orthogonal matrix. And note $R^{-1}=R^{T}$, which is rotation by $-\theta$ counterclockwise.

A reflection matrix (reflect across a line through the origin at angle $\theta$)
$$
F = \begin{pmatrix}\cos2\theta & \sin2\theta \\ \sin2\theta & -\cos2\theta\end{pmatrix}
$$
is also an orthogonal matrix, and $F^{-1}=F^{T}$, which in this case is just $F$ itself. This makes sense because a reflection is its own inverse!
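Both claims, $Q^{T}Q = I$ for orthonormal columns and $Q^{-1}=Q^{T}$ in the square case, are easy to see numerically. Here is a small NumPy sketch (my own, not part of the notes), reusing the orthonormal vectors built from the very first example:

```python
import numpy as np

# A tall 4x3 matrix Q whose columns are the normalized vectors from the
# first example: orthonormal columns, but Q is not square.
cols = [np.array([1, 2, 1, 2]), np.array([2, -1, 2, -1]), np.array([1, 0, -1, 0])]
Q = np.column_stack([c / np.linalg.norm(c) for c in cols])

print(np.allclose(Q.T @ Q, np.eye(3)))      # True:  Q^T Q = I_{3x3}
print(np.allclose(Q @ Q.T, np.eye(4)))      # False: Q Q^T is NOT I_{4x4} (the mystery matrix)

# A square example: the rotation matrix R is an orthogonal matrix,
# so its inverse is simply its transpose.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(np.linalg.inv(R), R.T))   # True
print(np.linalg.det(R))                     # approximately 1.0
```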
> **Geometric properties of an orthogonal matrix.**
> If $Q$ is an $n\times n$ orthogonal matrix, then
> (1) $Q$ preserves length: $\Vert Qx\Vert =\Vert x\Vert$ for all $x\in \mathbb R^{n}$.
> (2) $Q$ preserves orthogonality: If $x\cdot y= 0$, then $(Qx)\cdot(Qy)=0$, for $x,y\in\mathbb R^{n}$.
> (3) $Q$ preserves angles: If the angle between $x$ and $y$ is $\theta$, then so is the angle between $Qx$ and $Qy$.
> (4) $Q$ preserves the dot product: For any $x,y\in \mathbb R^{n}$, we have $(Qx)\cdot(Qy) = x\cdot y$.

$\blacktriangleright$ We demonstrate (4) by using the matrix product formulation of the dot product. Note
$$
(Qx)\cdot (Qy) = (Qx)^{T}(Qy)=x^{T}Q^{T}Qy=x^{T}Iy=x^{T}y = x\cdot y.
$$
Hence $Q$ preserves the dot product. You can show (1), (2), (3) for homework, using the same idea as (4). $\blacksquare$

Here is another nice property of orthogonal matrices, besides the inverse being just the transpose.

> **The set of $n\times n$ orthogonal matrices is closed under matrix products.**
> If $A$ and $B$ are two $n\times n$ orthogonal matrices, then so is their product $AB$.

$\blacktriangleright$ Note $A,B$ are both square of size $n\times n$, so $AB$ is also square of size $n\times n$. To show $AB$ has orthonormal columns, we compute $(AB)^{T}(AB)$ and wish to show it is the identity matrix. Note that
$$
(AB)^{T}(AB)=B^{T}A^{T}AB = B^{T}(A^{T}A)B = B^{T}IB = B^{T}B = I,
$$
using the fact that both $A^{T}A=I$ and $B^{T}B=I$, as $A$ and $B$ are orthogonal matrices themselves. Hence $AB$ is also an orthogonal matrix! $\blacksquare$

Of course, we do not expect orthogonal matrices to be closed under sums: for instance, $I$ and $-I$ are both orthogonal, but their sum is the zero matrix, which is not.

**Remarks.**

- Since an orthogonal matrix $Q$ preserves lengths and angles, it is intuitively a _rigid_ linear transformation, namely some kind of combination of rotation and reflection. In particular,
    - when it is an orientation-preserving rotation, it has determinant $+1$,
    - while when it is an orientation-reversing reflection, it has determinant $-1$.
- An orthogonal matrix is always invertible, and later we will use it in a matrix factorization when we talk about the real spectral theorem:
    - For a real symmetric matrix $A$ we always have $A = QDQ^{T}$, where $D$ is diagonal and $Q$ is orthogonal.
- A fancy word for preserving angles is _conformal_; since an orthogonal matrix also preserves lengths, it is in fact an _isometry_.

Also, to review the terminology:

- If $Q$ is an $n\times k$ matrix with orthonormal columns ($k\le n$), there is no special name for it; it is just a _matrix with orthonormal columns_. Here we have $Q^{T}Q = I_{k\times k}$; however, $QQ^{T}$ is some $n\times n$ matrix that we will investigate!
- To show that $A$ has orthonormal columns, it is sufficient to show that $A^{T}A=I$.
- If $Q$ is an $n\times n$ square matrix with orthonormal columns, then we call $Q$ an _orthogonal matrix_. Here we have both $Q^{T}Q=QQ^{T}=I_{n\times n}$.

////
Mystery revealed??
////
Answer: When $Q$ has orthonormal columns, $QQ^{T}$ is an orthogonal projection matrix! We will talk about orthogonal projection next!
////
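Finally, here is a small NumPy sketch (my own addition) that checks the geometric properties and the closedness under products on random orthogonal matrices, and sneaks in a peek at the mystery matrix $QQ^{T}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random 3x3 orthogonal matrices: the Q factor of a QR factorization
# of a random matrix has orthonormal columns.
A, _ = np.linalg.qr(rng.standard_normal((3, 3)))
B, _ = np.linalg.qr(rng.standard_normal((3, 3)))

x = rng.standard_normal(3)
y = rng.standard_normal(3)

# (1) length preservation and (4) dot product preservation.
print(np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x)))   # True
print(np.isclose((A @ x) @ (A @ y), x @ y))                   # True

# Closed under products: AB is again orthogonal.
AB = A @ B
print(np.allclose(AB.T @ AB, np.eye(3)))                      # True

# A peek at the mystery: for a tall Q with orthonormal columns,
# P = Q Q^T is symmetric and satisfies P^2 = P, as a projection should.
Q = A[:, :2]                       # 3x2 matrix with orthonormal columns
P = Q @ Q.T
print(np.allclose(P, P.T), np.allclose(P @ P, P))             # True True
```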